PROCOLLAGEN ASSEMBLY



The present invention relates to a method of regulating assembly of
procollagens and derivatives thereof.

Most cells, whether simple unicellular organisms or cells from human tissue, 
are surrounded by an intricate network of macromolecules which is known as the 
extracellular matrix and which is comprised of a variety of proteins and 
polysaccharides. The major protein component of this matrix is a family of related 
proteins called the collagens which are thought to constitute approximately 25% of 
total proteins in mammals. There are at least 20 genetically distinct types of collagen 
molecule, some of which are known as fibrillar collagens (collagen types I, II, III, V
and XI) because they typically form large fibres, known as collagen fibrils, that may
be many micrometers long and may be visualised by electron microscopy.

Collagen fibrils are comprised of polymers of collagen molecules and are 
produced by a process which involves conversion of procollagen to collagen 
molecules which then assemble to form the polymer. Procollagen consists of a triple 
stranded helical domain in the centre of the molecule and has non-helical regions at 
the amino terminal (known as the N-terminal propeptide) and at the carboxy terminal 
(known as the C-terminal propeptide). The triple stranded helical domain is made up
of three polypeptides which are known as α chains. Procollagen is synthesised
intracellularly from pro-α chains (α chains with N- and C-terminal propeptide
domains) on membrane-bound ribosomes following which the pro-α chains are
inserted into the endoplasmic reticulum. 

Within the endoplasmic reticulum the pro-α chains are assembled into
procollagen molecules. This assembly can be divided into two stages: an initial
recognition event between the pro-α chains which determines chain selectivity and
then a registration event which leads to correct alignment of the triple helix.



Procollagen assembly is initiated by association of the C-terminal propeptide domains
of each pro-α chain to form the C-terminal propeptide. Assembly of the triple helix
domain then proceeds in a C- to N-terminal direction and is completed by formation
of the N-terminal propeptide. The mature procollagen molecules are ultimately
secreted into the extracellular environment where they are converted into collagen by
the action of Procollagen N-Proteinases (which cleave the N-terminal propeptide) and
Procollagen C-Proteinases (which cleave the C-terminal propeptide). Once the
propeptides have been removed the collagen molecules thus formed are able to
aggregate spontaneously to form the collagen fibrils.

Collagens have many uses industrially. For instance, collagen gels can be
formed from collagen fibrils in vitro and may be used to support cell attachment. Such
gels may be used in cell culture to maintain the phenotype of certain cells, such as
chondrocytes explanted from cartilage. Collagen may also be used as a "filler" or
packing agent surgically and is particularly known to be used in cosmetic surgery, for
enlarging the appearance of lips for instance. In vivo, collagen is a major component
of the extracellular matrix and serves a multitude of purposes. Numerous diseases are 
known which involve abnormalities in collagen synthesis and regulation. Procollagens 
and derivatives thereof may be used (or be of potential use) for the treatment of these 
diseases. 

Large quantities of procollagens or derivatives thereof need to be synthesised
to meet increasing industrial demand. A convenient means of synthesising
procollagens or derivatives thereof is by expression of exogenous pro-α chains in a
host cell followed by the assembly of pro-α chains into the procollagen or derivative
thereof. For this to occur it is necessary to ensure that any host cell used has the
necessary post-translational facilities required to assemble procollagens from pro-α
chains. This may be achieved by expression in cells which normally synthesise
procollagen. However, one problem in such systems is that endogenously expressed
pro-α chains can co-assemble with the exogenously introduced pro-α chains giving

(i.e. the C-terminal propeptide domain of the manipulated or selected gene is so
designed that association between pro-α chains expressed from such a gene and
association between at least one further procollagen or derivative thereof is mutually
exclusive).

It is preferred that the exogenous selection from natural pro-α chains or
exogenous manipulation of gene or genes is effected by using recombinant DNA
techniques to generate DNA molecules which may be expressed in the system of the
method of the invention. Therefore, according to a second aspect of the invention,
there is provided a DNA molecule encoding for an exogenously manipulated or
selected pro-α chain or derivative thereof with a C-terminal propeptide domain which
will not co-assemble with the C-terminal propeptide domain of the pro-α chains or
derivatives thereof of at least one further procollagen or derivative thereof.

The DNA molecule may be incorporated within a suitable vector to form a 
recombinant vector. The vector may for example be a plasmid, cosmid or phage.
Such vectors will frequently include one or more selectable markers to enable
selection of cells transfected with the said vector and, preferably, to enable selection
of cells harbouring the recombinant vectors that incorporate the DNA molecule 
according to the second aspect of the invention. 

For expression of pro-α chains or derivatives thereof the vectors should be
expression vectors and have regulatory sequences to drive expression of the DNA
molecule. Vectors not including such regulatory sequences may also be used during
the preparation of the DNA molecule and are useful as cloning vectors for the
purposes of replicating the DNA molecule. When such vectors are used the DNA
molecule will ultimately be required to be transferred to a suitable expression vector
which may be used for production of the pro-α chains or derivatives thereof.

The system in which the exogenously selected pro-α chain(s) or exogenously
manipulated gene or genes of the method of the invention are expressed and
assembled into procollagen or derivatives thereof may be a cell-free in vitro system.
However, it is preferred that the system is a host cell which has been transfected with a
DNA molecule according to the second aspect of the invention. Such host cells may
be prokaryotic or eukaryotic. Eukaryotic hosts may include yeasts, insect and
mammalian cells. Hosts used for expression of the protein encoded by the DNA
molecule are ideally stably transformed, although the use of unstably transformed
(transient) hosts is not precluded.

Alternatively a host cell system may involve the DNA molecule being 
incorporated into a transgene construct which is expressed in a transgenic plant or,
preferably, animal. Transgenic animals which may be suitably formed for expression
of such transgene constructs include birds such as domestic fowl, amphibian species
and fish species. Procollagens or derivatives thereof and/or collagen polymers
formed therefrom may be harvested from body fluids or other body products (such as
eggs, where appropriate). Preferred transgenic animals are (non-human) mammals,
particularly placental mammals. An expression product of the DNA molecule of the 
invention may be expressed in the mammary gland of such mammals and the 
expression product may subsequently be recovered from the milk. Ungulates, 
particularly economically important ungulates such as cattle, sheep, goats, water 
buffalo, camels and pigs are most suitable placental mammals for use as transgenic 
animals according to the invention. Equally the transgenic animal could be a human in 
which case the expression of the pro-α chains or derivative thereof in such a person
could be a suitable means of effecting gene therapy. 

Host cells, and particularly transgenic plants or animals, may contain other
exogenous DNA, the expression of which facilitates the expression, assembly,
secretion or other aspects of the biosynthesis of procollagen and derivatives thereof
and even collagen polymers formed therefrom. For example, host cells and transgenic
plants or animals may also be manipulated to co-express prolyl 4-hydroxylase, which
is a post-translational enzyme important in the natural biosynthesis of procollagens, as
disclosed in WO-A-9307889.

DNA, particularly cDNA, encoding natural pro-α chains is known and
available in the art. For example, WO-A-9307889, WO-A-9416570 and the
references cited in both of them give details. Such DNA may be used as a convenient
starting point for making a DNA molecule that encodes for an exogenously 
manipulated gene according to the first aspect of the invention. Recombinant 
techniques may be used to derive the DNA molecule of the invention from such a 
starting point. 

DNA sequences, cDNAs, full genomic sequences and minigenes (genomic
sequences containing some, but not all, of the introns present in the full length gene)
may be inserted by recombinant means into a DNA sequence coding for naturally
occurring pro-α chains (such as the starting point DNA mentioned above) to form the
DNA molecule that encodes for an exogenously manipulated gene according to the
first aspect of the invention. Because of the large number of introns present in
collagen genes in general, experimental practicalities will usually favour the use of
cDNAs or, in some circumstances, minigenes. The inserted DNA sequences, cDNAs,
full genomic sequences or minigenes code for amino acids which give rise to pro-α
chains or derivatives thereof with a C-terminal propeptide domain which will not
co-assemble with the C-terminal propeptide domain of the pro-α chains or derivatives
thereof that assemble to form the said at least one further procollagen or derivative
thereof.

Preferred exogenous manipulations of the gene or genes involve alteration of
the recognition sequence within the C-terminal propeptide domain which is
responsible for selective association of pro-α chains such that any pro-α chain or
derivative thereof expressed from the manipulated gene will not undesirably
co-assemble with pro-α chains endogenously expressed from a host cell into which the
exogenously manipulated gene or genes is or are introduced.

Alternatively the gene or genes may be selected from naturally occurring
genes such that the recognition sequence within the C-terminal propeptide domain
which is responsible for selective association of pro-α chains is such that any pro-α
chain expressed from the selected gene will not undesirably co-assemble with pro-α
chains endogenously expressed from the host cell into which the gene or genes is or
are introduced.

Equally the gene or genes may be selected or exogenously manipulated such 
that the pro-α chains or derivatives thereof expressed from the gene or genes will not
undesirably co-assemble with other pro-α chains used in a cell-free in vitro system.

In our previous application PCT/GB96/02122 we disclosed novel molecules
comprising combinations of natural or novel C-terminal propeptide domains with
alien α chains (or a non-collagen material). PCT/GB96/02122 also disclosed DNA
molecules encoding such molecules. These DNA molecules are preferred molecules
of the second aspect of the current invention and may be used according to the
methods of the current invention. Such molecules disclosed in PCT/GB96/02122 are
incorporated herein by reference.

In PCT/GB96/02122 the inventors disclosed that they had determined that 
specific regions within the C- terminal propeptide were the recognition sequences 
involved in the specificity of association between C-terminal propeptide domains of 
pro-α chains during the formation of procollagens. These recognition sequences were
identified as having the following amino acid sequences for each respective pro-α
chain: 



pro-α1(I)      GGQGSDPADV AIQLTFLRLM STE
pro-α2(I)      NVEGYTSKEM ATQLAFMRi.L ANY
pro-α1(II)     GDDNF APNEA NYQMTFLRLL STF
pro-α1(III)    GNPELPEDYE DYQEAFLRLL SSR
pro-α1(V)      YDAEGNPVGY .YQMTFERLL SAS
pro-α2(V)      GDHQSPNTAI .TQMTFLRLL SKE
pro-α1(XI)     LDYEGNSINM A'QMTFLKLL TAS
pro-α2(XI)     YDSEGSPYGY A'OETFFREE SVS



These recognition sequences confer selectivity and specificity of pro-α chain
association. The DNA encoding for these sequences may be substituted for the DNA
encoding recognition sequences found in natural or artificially constructed pro-α
chain genes to form preferred DNA molecules of the invention. Alternatively, deletion,
addition or substitution mutations may be made within the DNA encoding for any one
of these recognition sequences which alter selectivity and specificity of pro-α chain
association.
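
The substitution described above can be pictured at the protein level with a minimal Python sketch. The function name `swap_recognition_sequence`, the placeholder alien motif and the flanking residues are hypothetical and are not part of the disclosure; only the native motif is taken from the text (the first sequence listed above, with spaces removed). The sketch assumes the native recognition sequence occurs exactly once in the chain.

```python
def swap_recognition_sequence(chain: str, native: str, alien: str) -> str:
    """Replace the single occurrence of `native` in `chain` with `alien`."""
    occurrences = chain.count(native)
    if occurrences != 1:
        raise ValueError(f"expected one native motif, found {occurrences}")
    return chain.replace(native, alien)


# First recognition sequence listed in the text, spaces removed (23 residues).
NATIVE_MOTIF = "GGQGSDPADVAIQLTFLRLMSTE"
# Placeholder alien motif and flanking residues: made up for illustration only.
ALIEN_MOTIF = "A" * len(NATIVE_MOTIF)
HOST_CHAIN = "MKK" + NATIVE_MOTIF + "LLK"

chimera = swap_recognition_sequence(HOST_CHAIN, NATIVE_MOTIF, ALIEN_MOTIF)
print(chimera)
```

In practice the exchange is made in the encoding DNA rather than in the protein string; the sketch only shows that the flanking sequence is left untouched while the recognition motif is exchanged.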



According to a third aspect of the present invention there is provided a method 
of producing a desired procollagen or derivative thereof in a system which co- 
expresses and assembles at least one further procollagen or derivative thereof wherein 
the pro-α chains or derivatives thereof for assembly into the desired procollagen or
derivative thereof are expressed with a C-terminal propeptide domain having a
non-natural ("alien") recognition sequence whereby said pro-α chains or derivatives
thereof of the desired procollagen will not co-assemble with pro-α chains of the at
least one further procollagen or derivative thereof.

Preferred non-natural recognition sequences are those discussed above and 
those disclosed in PCT/GB96/02122.



In a preferred exogenous manipulation of a gene according to the methods of
the invention, the DNA encoding for the recognition sequence of the proα2(I) chain
gene can be replaced with the corresponding DNA encoding for the recognition
sequence of the proα1(III) chain gene and this manipulated gene can be expressed and
assembled to form procollagens which are proα2(I) homotrimers (instead of
proα1(III) homotrimers which would normally be formed from pro-α chains
containing these recognition sequences). Thus according to the invention proα2(I)
homotrimers derived from an exogenous source may be formed which do not
co-assemble with proα2(I) chains endogenous to the cell in which expression occurs
which have "natural" recognition sequences.

In another preferred exogenous manipulation of a gene according to the 
methods of the invention, the manipulated gene encodes for a molecule comprising at 
least a first moiety having the activity of a procollagen C-propeptide (i.e. the
C-terminal propeptide domain of a pro-α chain) and a second moiety selected from any
one of an alien collagen α chain and non-collagen materials, the first moiety being
attached to the second moiety. Genes which encode for a second moiety of a
non-collagen material (such as those disclosed in PCT/GB96/02122) are examples of
pro-α chain derivatives according to the invention.

Other preferred exogenous manipulations of a gene according to the methods
of the invention involve the construction of gene constructs which encode for
chimeric pro-α chains or derivatives thereof formed from the genetic code of at least
two different pro-α chains. It is particularly preferred that the chimeric pro-α chains
or derivatives thereof comprise a recognition sequence from the C-terminal
propeptide domain of one type of pro-α chain and the α chain domain from another
type of pro-α chain. Preferred chimeric pro-α chains or derivatives thereof are formed
from pro-α1(I), pro-α2(I), pro-α1(II), pro-α1(III), pro-α1(V), pro-α2(V),
pro-α1(XI) or pro-α2(XI) pro-α chains. Most preferred pro-α chains for making
chimeric pro-α chains or derivatives thereof are those which form collagens I and III,
particularly pro-α2(I) and pro-α1(III). Specific preferred chimeric pro-α chains or
derivatives thereof are disclosed in the Example.

The methods of the invention enable the expression and assembly of any
desired procollagen or derivative thereof in a system in which conventionally there
would be undesirable co-assembly or hybridisation of pro-α chains. The methods are
particularly suitable for allowing the expression of procollagen or derivatives thereof
from a wide variety of cell-lines or transgenic organisms without the problems
associated with co-assembly with endogenously expressed pro-α chains. A preferred
use of the methods of the invention is the production of recombinant procollagens in
cell-lines. Examples of cell-lines which may be used are fibroblasts or cell lines
derived therefrom. Baby Hamster Kidney cells (BHK cells), mouse cells,
Chinese Hamster Ovary cells (CHO cells) and COS cells may be used.

The methods of the invention are particularly useful as an improved means of 
production of any desired procollagen or derivatives thereof, particularly for scaled up
industrial production by biotechnological means. 

The method of the invention may also be useful for treatment by gene therapy
of patients suffering from diseases such as osteogenesis imperfecta (OI), some forms
of Ehlers-Danlos syndrome (EDS) or certain forms of chondrodysplasia. In most
cases the devastating effects of these diseases are due to substitutions of glycine
within the triple helical domain for amino acids with bulkier side chains in the pro-α
chains. This substitution results in triple helix folding, during the formation of
procollagen, being prevented or delayed with the consequence that there is a drastic
reduction in the secretion of the procollagen. The malfolded proteins are retained
within the cell, probably within the endoplasmic reticulum, where they are degraded.
Furthermore, the folding of the C-terminal propeptide domain is not affected by these
mutations within the triple helical domain; therefore C-terminal propeptide domains
from normal as well as mutant chains may associate, resulting in the retention of
normal and mutant pro-α chains within the cell. The retention and degradation of
normal chains due to their interaction with mutant chains amplifies the effect of the
mutation and has been termed "procollagen suicide". This phenomenon probably
explains why such mutations have lethal effects. The identification by the inventors
of the recognition sequence which directs the initial association between pro-α chains
provides a target for therapeutic intervention allowing for the modulation or inhibition
of collagen deposition. Thus, the method of the invention could be utilised as a gene
therapy to transfer a copy of the wild-type gene to an individual with a mutation in the
triple helical domain, such that the wild-type gene is exogenously manipulated to code
for a pro-α chain with a C-terminal propeptide domain that will not co-assemble with
the mutant pro-α chains. The patient is then able to secrete authentic collagen chains
in cells expressing mutant chains.

The pro-α chains or derivatives thereof expressed from exogenously selected or
exogenously manipulated genes in accordance with the invention may be assembled into
trimers to form procollagen molecules or derivatives thereof, which in turn may be
formed into collagen polymers following exposure to Procollagen C-Proteinases and
Procollagen N-Proteinases. The Proteinases cleave the C- and N-terminal
propeptides from the procollagen molecules to form monomers which aggregate
spontaneously to form the collagen polymers.

The present invention will now be described, by way of example, with
reference to the accompanying drawings, in which:

Figure 1 is a schematic representation of the stages in normal procollagen
assembly (A) and stages in procollagen assembly according to one embodiment of the
invention (B);

Figure 2 shows an alignment plot of the C-terminal propeptide domains of
pro-α chains from type I and III collagen. The alignment shows amino acids which are
identical or those which are conserved. The conserved cysteine residues are
numbered 1-8, while letters A, B, C, F, G denote the first amino acid at the junctions
between proα1(III) chains and proα2(I) chains of the Example;

Figure 3 is a schematic representation of the chimeric pro-α chains described
in the Example;

Figure 4 is a photograph of an SDS-PAGE gel illustrating disulphide bond
formation among chimeric gene constructs in which the C-terminal propeptide domains
were exchanged, with the following parental and chimeric molecules from the
Example run in the indicated lanes of the gel: proα1(III)Δ1 [α1(III)], proα2(I)Δ1
[α2(I)] (parental molecules) and proα2(I):(III)CP [α2:CP], proα1(III):(I)CP [α1:CP]
(hybrid chains); these molecules were expressed in a rabbit reticulocyte lysate in the
presence of semi-permeabilized (SP) HT1080 cells, after which the SP-cells were
isolated by centrifugation, solubilized and the translation products separated by
SDS-PAGE through a 7.5% gel under reducing (lanes 1-4) or non-reducing conditions
(lanes 5-8);

Figure 5 is a photograph of an SDS-PAGE gel in which the lanes represent the effect of
heat denaturation of the proα2(I):(III)CP triple-helix at the specified temperatures; the
samples were prepared in the following manner: proα2(I):(III)CP RNA was translated
in the presence of SP-cells, after which the SP-cells were isolated by centrifugation,
solubilized and treated with pepsin (100 µg/ml); the reaction mixture was neutralized,
diluted in chymotrypsin/trypsin digest buffer and divided into aliquots, each aliquot
being heated to a set temperature prior to digestion with a combination of trypsin (100
µg/ml) and chymotrypsin (250 µg/ml); samples were analysed by SDS-PAGE through
a 12.5% gel under reducing conditions (lanes 1-10). Lane 11 (unt) contains
translation products which have not been treated with proteases;

Figure 6 is a photograph of an SDS-PAGE gel illustrating trimerization and
triple-helix formation among chimeric procollagen chains; samples were prepared
from parental chains proα1(III)Δ1 and proα2(I)Δ1, which were made into hybrids
proα2(I):(III)CP, A, F, F^S-C and proα1(III):(I)C (α2:CP, A, F, F^S-C, B^S-C,
C^S-C, α1:C); the hybrids were translated in a rabbit reticulocyte lysate in the
presence of SP-cells, after which the SP-cells were isolated by centrifugation,
solubilized and a portion of the translated material separated by SDS-PAGE under
non-reducing conditions through a 7.5% gel (lanes 1-9);

Figure 7 is a photograph of an SDS-PAGE gel illustrating trimerization and
triple-helix formation among chimeric procollagen chains; lanes show the remainder
of the samples that were loaded on the gel of Figure 6, which were treated with pepsin
(100 µg/ml) prior to neutralization and digestion with a combination of trypsin (100
µg/ml) and chymotrypsin (250 µg/ml); the proteolytic digestion products were
analysed by SDS-PAGE through a 12.5% gel under reducing conditions (lanes 1-9);

Figure 8 is a photograph of an SDS-PAGE gel illustrating trimerization and
triple-helix formation among chains containing the 23 amino acid B-G motif; the
lanes show recombinant procollagen chains proα1(III)CP, proα2(I):(III)CP and
proα2(I):(III)BGR^S-C which were expressed in a reticulocyte lysate supplemented with
SP-cells, after which the SP-cells were isolated by centrifugation, solubilized and a
portion of the translated material separated by SDS-PAGE through a 7.5% gel under
reducing (lanes 1-3) or non-reducing conditions (lanes 4-5);

Figure 9 is a photograph of an SDS-PAGE gel illustrating trimerization and
triple-helix formation among chains containing the 23 amino acid B-G motif; the
lanes show the remainder of the samples that were loaded on the gel of Figure 8, which
were treated with pepsin (100 µg/ml) prior to neutralization and digestion with a
combination of trypsin (100 µg/ml) and chymotrypsin (200 µg/ml); the proteolytic
digestion products were analysed by SDS-PAGE through a 12.5% gel under reducing
conditions (lanes 1-3);

Figure 10 is a photograph of an SDS-PAGE gel illustrating the effect of
Cys-Ser reversion and Leu-Met mutation on the assembly of proα2(I):(III)BGR chains; the
lanes show recombinant procollagen chains proα2(I):(III)BGR^S-C, proα2(I):(III)BGR^C-S
and proα2(I):(III)BGR^L-M which were translated in a reticulocyte lysate supplemented
with SP-cells, after which the cells were isolated by centrifugation, solubilized and a
portion of the translated material separated by SDS-PAGE through a 7.5% gel under
reducing (lanes 1-3) or non-reducing conditions (lanes 4-6);



Figure 11 is a photograph of an SDS-PAGE gel illustrating the effect of
Cys-Ser reversion and Leu-Met mutation on the assembly of proα2(I):(III)BGR chains; the
lanes show the remainder of the samples that were loaded on the gel of Figure 10, which
were treated with pepsin (100 µg/ml) prior to neutralization and digestion with a
combination of trypsin (100 µg/ml) and chymotrypsin (250 µg/ml); the proteolytic
digestion products were analysed by SDS-PAGE through a 12.5% gel under reducing
conditions (lanes 1-5);

Figure 12 is a photograph of an SDS-PAGE gel illustrating that inter-chain
disulfide bonds form between proα2(I):(III)BGR C-terminal propeptide domains; the
lanes show recombinant pro-α chains proα1(III)Δ1 and proα2(I):(III)BGR which
were translated in a reticulocyte lysate supplemented with SP-cells. The cells were
isolated by centrifugation, solubilized and digested with 1.5 units of bacterial
collagenase. The products of digestion were analysed by SDS-PAGE through a 10%
gel under reducing (lanes 2 and 3) or non-reducing (lanes 4 and 5) conditions; and

Figure 13 is a schematic representation of sequence alignment of the chain
selectivity recognition domains in other fibrillar procollagens; sequence homology
within the 23 residue B-G motif is illustrated, the boxed regions indicating the position
of the unique 15 residue sub-domain which directs pro-α chain discrimination.

Figure 1 illustrates how procollagen is assembled in the endoplasmic reticulum
of a cell. Normally assembly is initiated by type specific association of C-terminal
propeptide domains of complementary pro-α chains (1) to form procollagens (2).
Procollagen is secreted from the cell in which it is synthesised and is then acted upon by
Procollagen N-Proteinases and Procollagen C-Proteinases which cleave the N-terminal
propeptide and C-terminal propeptide respectively to yield collagen molecules (3).
Collagen molecules may then spontaneously aggregate to form collagen fibrils. Pro-α
chains with non-complementary C-terminal propeptide domains (4) do not associate
to form procollagens. When exogenous pro-α chains (5) are introduced into a cell
they may co-assemble with endogenous pro-α chains (6) which have complementary
C-terminal propeptide domains to form undesirable hybrids (7). According to the methods
of the invention exogenously manipulated pro-α chains (8) are generated with
C-terminal propeptide domains that are no longer complementary to the C-terminal
propeptide domains of the endogenous pro-α chains (6) such that the exogenously
manipulated pro-α chains (8) may form procollagens (9) and subsequently collagen
molecules (10) without co-assembly with endogenous pro-α chains (6) occurring.
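
The logic of Figure 1 can be sketched as a toy model in Python. This is an illustration under stated assumptions, not a description of the inventors' method: complementarity between C-terminal propeptide domains is reduced to a hypothetical `group` label (chains associate only if their labels match), and the names `Chain`, `possible_trimers` and `hybrid_trimers` are invented for the example.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class Chain:
    name: str
    origin: str  # "endogenous" or "exogenous"
    group: str   # hypothetical label standing in for C-propeptide complementarity


def possible_trimers(chains):
    """All three-chain sets whose members share one compatibility group."""
    return [t for t in combinations_with_replacement(chains, 3)
            if len({c.group for c in t}) == 1]


def hybrid_trimers(chains):
    """Trimers that mix endogenous and exogenous chains."""
    return [t for t in possible_trimers(chains)
            if len({c.origin for c in t}) == 2]


endogenous = Chain("endogenous chain (6)", "endogenous", "type-A")
unmodified = Chain("exogenous chain (5)", "exogenous", "type-A")
manipulated = Chain("manipulated chain (8)", "exogenous", "type-B")

# Unmodified exogenous chains share the endogenous group, so hybrids can form.
print(len(hybrid_trimers([endogenous, unmodified])))
# Manipulated chains carry a different group, so only pure trimers remain.
print(len(hybrid_trimers([endogenous, manipulated])))
```

The model ignores stoichiometry and heterotrimer rules; it only captures the point that changing the recognition label of the exogenous chain removes every mixed trimer while leaving trimers of each pure population possible.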



EXAMPLE

The inventors generated DNA molecules according to the second aspect of the
invention from which they expressed pro-α chains with altered selectivity for pro-α
chain assembly. Experimental strategy was based on the assumption that transfer of
C-terminal propeptide domains (or sequences within the C-propeptide) from the
homotrimeric proα1(III) chain to the proα2(I) molecule would be sufficient to direct
self-association and assembly into homotrimers of proα2(I). The inventors
reconstituted the initial stages in the assembly of procollagen by expressing specific
RNAs in a cell-free translation system in the presence of semi-permeabilized cells
known to carry out the co- and post-translational modifications required to ensure
assembly of a correctly aligned triple helix. By analysing the folding and assembly
pattern of procollagens formed from a series of chimeric pro-α chains in which
specific regions of the C-terminal propeptide domain of proα1(III) were exchanged
with the corresponding region within the proα2(I) chain (and vice versa) the inventors
identified a short discontinuous sequence of 15 amino acids within the proα1(III)
C-propeptide which directs procollagen self-association. This sequence is, therefore,
responsible for the initial recognition event and is necessary to ensure selective chain
association.

1. MATERIALS AND METHODS 

1.1 Construction of recombinant plasmids 

paKIIDAl and P a2(l)Al are recombinant pro-a chains with truncated a chain 
domams which have been described previously (see Lees and Bulled ,1W, ... Biol. 
Chem 26Q p24354-243601 W). Chimaenc molecules were generated by PCR 
overlap extension using the principles outlined by Morton .W) Methods in 
Molecular Biology Vol 15. Chapter 25. Humana Press Inc.. Totowa. NJ. PCRs 
(1 «)Oul) compromised template DNA ,500 ng,. oligonucleotide primers < 100 pmol 
each, in 10 mM KC1. 20 mM Tr.s-HCI P H 8.8. 10mM tNH 4 »:SO,. 2 mM MuSO., 
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^ inn -nn e-irh dNTP Terwounds of amplification were 
0.1° o (v'v) Triton X-100. ^00 liM ^acn a.\n. 

performed in the presence of 1 unit Vent DNA polymerase (New England Biolabs,
MA). Recombinants pα2(I)Δ1:(III)CP, A, F, B, C were generated using a 5'
oligonucleotide primer (5' AGATOOTCOCACTGGACATC 3'), complementary to a
sequence 70 bp upstream of an [...] site in pα2(I)Δ1, and a 3' oligonucleotide primer
(5' TCGCAGGGATCCGTCCGTCACTTGCACTGOTT 3'), complementary to a region
100 bp downstream of the stop codon in pα1(III)Δ1. A BamHI site was introduced
into this primer to facilitate subsequent sub-cloning steps. Pairs of internal
oligonucleotides, of which one included a 20 nucleotide overlap, were designed to
generate molecules with precise junctions as delineated (see Figs 2 and 3). Overlap
extension yielded a product of ~[...] bp which was purified, digested with
[...]-BamHI and ligated into pα2(I)Δ1 from which a 1080 bp [...]-BamHI fragment
had been excised. Recombinants pα1(III)Δ1:(I)CP, C were synthesized in a similar
manner using a 5' oligonucleotide (5' AATGGAGCTCCTGGACCCATG 3')
complementary to a sequence 100 bp upstream of an [...] site in pα1(III)Δ1 and a 3'
amplification primer (5' CTGCTACGTACCAAATGGAAGGATTCAGCT[...] 3'),
which incorporated a [...] site and was complementary to a region 100 bp
downstream of the stop codon in pα2(I)Δ1. Overlap extension produced a fragment
of 1100 bp which was digested with [...] and [...] and ligated into pα1(III)Δ1 from
which an 1860 bp fragment had been removed. Recombinant pα2(I)Δ1:(III)BGR was
constructed using the same amplification primer used to synthesize the
pα2(I)Δ1:(III) series of chimeras and a 5' oligonucleotide which was identical to
that used to generate the pα1(III)Δ1:(I)CP, C constructs except that it contained a
[...] site instead of [...], complementary to pα2(I)Δ1. Primary amplification
products were generated from pα2(I)Δ1:(III)B and pα2(I)Δ1 with internal
oligonucleotides determining the junction. Overlap extension produced a fragment
which was digested with [...] and BamHI and ligated into pα2(I)Δ1. Site-directed
mutagenesis was performed essentially as described by Kunkel et al. (Kunkel et al.
(1987) Methods in Enzymol. 154) except that extension reactions were performed in
the presence of 1 unit T4 DNA polymerase and 1 µg T4 gene 32 protein
(Boehringer, Lewes, UK).
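The overlap extension step described above can be sketched in silico. The following is a minimal illustration only: it models just the sequence outcome (two fragments fused through a shared 20-nucleotide junction), not strand annealing or polymerase extension. The function name and all sequences are hypothetical and are not the primers or templates used in this work.

```python
def overlap_extend(upstream: str, downstream: str, overlap: int = 20) -> str:
    """Return the fused product of two fragments sharing `overlap` nucleotides.

    The 3' end of the upstream fragment must be identical to the 5' end of
    the downstream fragment; one copy of the shared junction is kept.
    """
    if len(upstream) < overlap or len(downstream) < overlap:
        raise ValueError("fragments must be at least as long as the overlap")
    if upstream[-overlap:] != downstream[:overlap]:
        raise ValueError("fragment ends do not share the expected overlap")
    return upstream + downstream[overlap:]


# Hypothetical sequences, for illustration only.
JUNCTION = "ACGTTGCAAGCTTGACCTGA"           # 20-nt overlap built into one internal primer
FRAGMENT_1 = "ATGGCCAAGT" * 3 + JUNCTION    # primary product from the first template
FRAGMENT_2 = JUNCTION + "TTGACCGGTA" * 3    # primary product from the second template

product = overlap_extend(FRAGMENT_1, FRAGMENT_2)
print(len(product))  # 30 + 20 + 30 = 80
```

Because the junction is fixed by the overlap built into the internal primers, the position at which one parental sequence gives way to the other is determined exactly, which is what allows the precise junctions referred to in the text.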

1.2 Transcription in vitro 

Transcription reactions were carried out as described by Gurevich et al. (1991)
(see Gurevich et al. (1991) Anal. Biochem. 195 p207-213). Recombinant plasmids
pα1(III)Δ1, pα1(III)Δ1:(I)CP, C and pα2(I)Δ1, pα2(I)Δ1:(III)CP, A, F, F^S-C, B^S-C, C^S-C
(10 µg) were linearized and transcribed using T3 RNA polymerase or T7 RNA
polymerase (Promega, Southampton, UK) respectively. Reactions (100 µl) were
incubated at 37°C for 4 h. Following purification over RNeasy columns (Qiagen,
Dorking, UK), RNA was resuspended in 100 µl RNase-free water containing 1 mM
DTT and 40 units RNasin (Promega, Southampton, UK).



1.3 Translation in vitro 

RNA was translated using a rabbit reticulocyte lysate (FlexiLysate, Promega,
Southampton) for 2 hours at 30°C in the absence of exogenous DTT. The translation
reaction (25 µl) contained 17 µl reticulocyte lysate, 1 µl 1 mM amino acids (minus
methionine), 0.45 µl 100 mM KCl, 0.25 µl ascorbic acid (5 mg/ml), 15 µCi
[L-35S]methionine (Amersham International, Bucks, UK), 1 µl transcribed RNA and 1 µl
(~2 x 10^5) semi-permeabilized cells (SP-cells) prepared as described by Wilson et al.
(1995) Biochem. J. 307 p679-687. After translation, N-ethylmaleimide was added to
a final concentration of 20 mM. SP-cells were isolated by centrifugation in a
microfuge at 10000 g for 5 min and the pellet resuspended in an appropriate buffer for
subsequent enzymic digestion or gel electrophoresis.
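As a worked check of the reaction composition above, the following sketch tallies the stated volumes and computes the concentration each added stock contributes to the 25 µl reaction. It is illustrative arithmetic only: the volume of the radiolabelled methionine is not stated in the text, so it is left inside the unassigned remainder, and the lysate's own salt and amino acid content is ignored. The function name is hypothetical.

```python
REACTION_VOLUME_UL = 25.0

# Volumes (µl) of the components whose volume is stated in the text.
STATED_VOLUMES_UL = {
    "reticulocyte lysate": 17.0,
    "amino acids minus methionine (1 mM stock)": 1.0,
    "KCl (100 mM stock)": 0.45,
    "ascorbic acid (5 mg/ml stock)": 0.25,
    "transcribed RNA": 1.0,
    "SP-cells": 1.0,
}


def added_concentration(stock: float, volume_ul: float,
                        total_ul: float = REACTION_VOLUME_UL) -> float:
    """Concentration contributed by a stock after dilution into the reaction."""
    return stock * volume_ul / total_ul


stated_total_ul = sum(STATED_VOLUMES_UL.values())
# Remainder covers the labelled methionine and any water (volumes not stated).
unassigned_ul = REACTION_VOLUME_UL - stated_total_ul

kcl_mm = added_concentration(100.0, 0.45)          # mM
amino_acids_um = added_concentration(1000.0, 1.0)  # µM (1 mM = 1000 µM)
ascorbate_ug_ml = added_concentration(5000.0, 0.25)  # µg/ml (5 mg/ml = 5000 µg/ml)

print(f"stated volume: {stated_total_ul:.2f} µl, unassigned: {unassigned_ul:.2f} µl")
print(f"added KCl: {kcl_mm:.2f} mM")
print(f"added amino acids: {amino_acids_um:.0f} µM each")
print(f"added ascorbic acid: {ascorbate_ug_ml:.0f} µg/ml")
```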



1.4 Bacterial collagenase digestion 

SP-cells were resuspended in 50 mM Tris-HCl pH 7.4 containing 5 mM
CaCl2, 1 mM phenylmethanesulfonyl fluoride (PMSF), 5 mM N-ethylmaleimide and
1% (v/v) Triton X-100 and incubated with 3 units collagenase form III (Adv.
Biofacture, Lynbrook, NJ) at 37°C for 1 h. The reaction was terminated
by the addition of SDS-PAGE sample buffer.

1.5 Proteolytic digestion 

Isolated SP-cells were suspended in 0.5% (v/v) acetic acid, 1% (v/v) Triton
X-100 and incubated with pepsin (100 µg/ml) for 2 h at 20°C or 16 h at 4°C. The
reactions were stopped by neutralization with Tris-base (100 mM). Samples were
then digested with a combination of chymotrypsin (250 µg/ml) and trypsin ([...])
(Sigma, Poole, Dorset, UK) for [...] min at room temperature in the presence of
50 mM Tris-HCl pH 7.4 containing 0.15 M NaCl, 10 mM EDTA. The reactions
were stopped by the addition of soybean trypsin inhibitor (Sigma, Poole, Dorset, UK)
to a final concentration of 500 µg/ml and boiling SDS-PAGE loading buffer. Samples
were then boiled for 5 min.

1.6 Thermal denaturation 

Pepsin-treated samples were resuspended in 50 mM Tris-HCl pH 7.4 [...]. The
temperature gradient [...] held [...] intervals. At the end of each time period the
sample was treated with a combination of chymotrypsin and trypsin, as described above.

1.7 SDS-PAGE

Samples were resuspended in SDS-PAGE loading buffer (0.0625 M Tris-HCl pH
6.8, SDS (2% w/v), glycerol (10% v/v) and Bromophenol Blue) in the presence or
absence of 50 mM DTT and boiled for 5 min. SDS-PAGE was performed using the
method of Laemmli (1970) Nature 227 p680-685. After electrophoresis, gels were
processed for autoradiography and exposed to Kodak X-Omat AR film, or
quantified by phosphorimage analysis.



2. RESULTS 



2.1 Transfer of the proα1(III) C-propeptide to the proα2(I) chain is sufficient to
direct self-assembly.

Experimental strategy was based on the assumption that transfer of the C-terminal
propeptide domain from the proα1(III) chain to the proα2(I) chain should be
sufficient to direct self-recognition and assembly into homotrimers. Hence, by
exchanging different regions within the proα1(III) C-terminal propeptide domain
with the corresponding sequence from the proα2(I) chain, the intention was to
distinguish between sequences that direct the folding of tertiary structure and those
involved in the selection (i.e. recognition of pro-α chains) process. To simplify
analysis of the translation products, chimeric procollagen molecules were constructed
from two parental procollagen 'mini-chains', proα1(III)Δ1 and proα2(I)Δ1. These
molecules, which have been described previously (Lees and Bulleid, 1994), comprise
both the N- and C-terminal propeptide domains together with truncated triple-helical
domains. The initial assumption was tested by analysing the folding and assembly of
chimeric procollagen chains in which the C-terminal propeptide domain of the
proα2(I) chain was substituted with the equivalent domain from the proα1(III)Δ1
chain (proα2(I):(III)CP) and, conversely, where the C-propeptide of the proα1(III) chain
was replaced with that from the proα2(I)Δ1 chain (proα1(III):(I)CP) (see Figs 2 and 3).
The C-propeptide (CP) junction points were determined by the sites of cleavage by
the procollagen C-proteinase (PCP), which is known to occur between Ala and Asp
(residues 1119-1120) in the proα2(I) chain (Kessler et al. (1996) Science 271 p360-362). In
the absence of data regarding the precise location of cleavage within the proα1(III)
chain, the inventors chose to position the junction between Ala and Pro (residues
1217-1218). However, Kessler and co-workers (1996) have subsequently shown that
cleavage by PCP occurs between Gly and Asp (residues 1222-1223), with the
consequence that recombinant proα2(I):(III)CP includes an additional four residues
derived from the proα1(III) C-telopeptide, whilst the C-telopeptide in construct
proα1(III):(I)CP is missing those same four amino acids. RNA transcripts were
transcribed in vitro and expressed in a cell-free system comprising a rabbit
reticulocyte lysate optimized for the formation of disulfide bonds supplemented with
semi-permeabilized HT1080 cells (SP-cells), which has been shown previously to
carry out the initial stages in the folding, post-translational modification and assembly
of procollagen (Bulleid et al. (1996) Biochem. J. 317 p195-202). The C-terminal
propeptide domains of both proα1(III) and proα2(I) chains contain cysteine residues
which participate in the formation of interchain disulfide bonds. Translation products
were, therefore, separated by SDS-PAGE under reduced and non-reduced conditions
in order to detect disulfide-bonded trimers. Translation of the parental molecules
proα1(III)Δ1 and proα2(I)Δ1 yielded major products of ~77 kDa and 61 kDa
respectively (Figure 4, lanes 1 and 2), the size differential being accounted for by the
relative molecular weights of the N-propeptides and truncated triple-helical domains
in each molecule (Lees and Bulleid, 1994). The heterogeneity of the translation
products is due to hydroxylation of proline residues in the triple-helical domain that
leads to an alteration in electrophoretic mobility (Cheah et al. (1979) Biochem.
Biophys. Res. Comm. 91 p1025-1031). The additional lower molecular weight
proteins present in lanes 3 and 7 probably represent translation products obtained after
initiation of translation at internal start codons. We have previously shown that these
minor translation products are not translocated into the endoplasmic reticulum (Lees
and Bulleid, 1994). The presence of high molecular weight species under non-reducing
conditions but not reducing conditions is indicative of interchain disulfide
bond formation. Separation under non-reduced conditions revealed that proα1(III)Δ1,
but not proα2(I)Δ1, chains were able to self-associate to form disulfide-bonded trimers
(Figure 4, lanes 5 and 6). A similar examination of chimeric chains proα2(I):(III)CP
and proα1(III):(I)CP revealed that only proα2(I):(III)CP chains were able to form
disulfide-bonded homotrimers (Figure 4, lanes 3, 4, 7 and 8), demonstrating that the
C-propeptide from type III procollagen is both necessary and sufficient to drive the
initial association between procollagen chains.



It has been shown previously that proα1(III)Δ1 chains synthesised in the
presence of SP-cells were resistant to a combination of pepsin, chymotrypsin and
trypsin in a standard assay used specifically to detect triple-helical procollagen
(Bulleid et al., 1996). The inventors confirmed that proα2(I):(III)CP chains had the
ability to form a correctly aligned triple-helix by performing a thermal denaturation
experiment in which translated material was heated to various temperatures prior to
protease treatment (Figure 5). The results indicate that at temperatures below 35°C a
protease-resistant triple-helical fragment is present, but at temperatures above 35°C
the triple-helix melts and becomes protease sensitive (Figure 5, lanes 1-10). The
melting temperature (Tm) was calculated to be ~35.5°C after quantification by
phosphorimage analysis. The Tm value obtained for proα2(I):(III)CP is significantly
lower than the figure of 39.5°C obtained for proα1(III)Δ1 (Bulleid et al., 1996) and
probably reflects the percentage of hydroxyproline residues relative to the total
number of amino acids in the triple-helical domain (11% and 15% respectively).
These results indicate that transfer of the proα1(III) C-propeptide enables the inventors
to generate an entirely novel procollagen species comprising three proα2(I) chains
that fold into a correctly aligned triple-helix.
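One simple way a melting temperature could be derived from quantified band intensities is sketched below. This is an assumption about the calculation, not the inventors' stated procedure: Tm is taken as the temperature at which the protease-resistant signal falls to 50% of its starting value, found by linear interpolation between adjacent temperature points. The temperatures and intensities are invented for illustration and are not data from Figure 5.

```python
def melting_temperature(temps_c, intensities):
    """Interpolate the temperature at which the signal drops to 50% of its first value."""
    if len(temps_c) != len(intensities) or len(temps_c) < 2:
        raise ValueError("need matching lists with at least two points")
    fractions = [value / intensities[0] for value in intensities]
    for i in range(len(fractions) - 1):
        upper, lower = fractions[i], fractions[i + 1]
        if upper >= 0.5 > lower:
            step = temps_c[i + 1] - temps_c[i]
            return temps_c[i] + (upper - 0.5) / (upper - lower) * step
    raise ValueError("signal never falls below 50% of its starting value")


# Hypothetical phosphorimager readings (arbitrary units), one per temperature.
temps = list(range(31, 41))                      # 31..40 degrees C, illustrative
signal = [100, 99, 97, 92, 75, 40, 12, 4, 1, 0]  # illustrative intensities

tm = melting_temperature(temps, signal)
print(f"{tm:.1f}")  # 35.7
```

A sigmoidal curve fit would be a more robust alternative where the transition is sampled by several points; the interpolation above is only the simplest estimate.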

2.2 Assembly of recombinant procollagen chains with chimeric C-propeptides.

Given that the proα2(I):(III)CP hybrid pro-α chain includes all of the
information required for self-association, we reasoned that progressive removal of the
proα1(III) C-propeptide sequence and replacement with the corresponding proα2(I)
sequence would eventually disrupt the chain selection mechanism. Conversely, it is
anticipated that transfer of progressively more proα1(III) C-terminal propeptide
domain sequence to the proα1(III):(I)CP chimeric chain would yield a molecule
which was capable of self-assembly. A series of procollagen chains with chimeric
C-terminal propeptide domains was constructed and the ability of individual chains to
form homotrimers with stable triple-helical domains was assessed. A schematic
representation of these recombinants is presented in Figure 2, with the letters A, B, C,
F and G denoting the position of each junction. It should be noted that the proα1(III)
and proα2(I) C-propeptides differ in their complement of cysteine residues, with
proα2(I) lacking the Cys2 residue. Our previous data suggest that interchain disulfide
bonds within the C-propeptide of type III procollagen form exclusively between Cys2
and Cys3 (Lees and Bulleid, 1994). However, interchain disulfide bonding between
either the C-terminal propeptide domains or C-telopeptides is not required for chain
association and triple-helix formation (Bulleid et al., 1996); therefore, it is possible
that homotrimers may form between chimeric pro-α chains which lack either the
C-terminal propeptide domain Cys2 residue or the C-telopeptide cysteine [only found in
the triple-helical domain of proα1(III)]. These molecules will not, however, contain
interchain disulfide bonds and, as a consequence, will not appear as oligomers after
analysis under non-reducing conditions. To circumvent this problem, where
appropriate, the inventors generated their hybrid chains from a recombinant
proα2(I)Δ1^S-C (Lees and Bulleid, 1994) in which the existing serine residue was
substituted for cysteine, thus restoring the potential to form trimers stabilized by
interchain disulfide bonds. It should also be noted that whilst proα1(III):(I)CP lacks
Cys2, it does still retain the potential to form disulfide-bonded trimers by virtue of the
two cysteine residues located at the junction of the triple-helical domain and the
C-telopeptide. Parental chains proα1(III)Δ1 and proα2(I)Δ1 and hybrids
proα2(I):(III)CP, A, F, F^S-C, B^S-C, C^S-C and proα1(III):(I)C were translated in the
presence of SP-cells and the products separated by SDS-PAGE under non-reducing
conditions (Figure 6). The results demonstrate that recombinants proα1(III)Δ1,
proα2(I):(III)CP, A, F^S-C and B^S-C (Figure 6, lanes 1, 3, 4, 6 and 7) are able to form
interchain disulfide-bonded trimers and dimers, while proα2(I)Δ1, proα2(I):(III)F,
C^S-C and proα1(III):(I)C (Figure 6, lanes 2, 8 and 9) remain monomeric. We have
already demonstrated that interchain disulfide bonding is not a prerequisite for
triple-helix formation (Bulleid et al., 1996); therefore, the inability to form
disulfide-bonded trimers does not preclude the possibility that the molecules
assemble to form a triple-helix. To ascertain whether the chimeric chains
had the ability to fold into a correctly aligned triple-helix, we treated translation
products with a combination of pepsin, chymotrypsin and trypsin and analysed the
digested material under reducing conditions by SDS-PAGE. As shown in Figure 7,
recombinants proα1(III)Δ1, proα2(I):(III)CP, A, F, F^S-C and B^S-C (Figure 7, lanes 1, 3,
4, 5, 6 and 7) all yielded protease-resistant fragments. The size differential reflects the
relative lengths of the triple-helical domains in each of the parental molecules
[proα2(I)Δ1 = 185 residues and proα1(III)Δ1 = 1[...]2 residues]. The ability of
proα2(I):(III)F to form a stable triple-helix confirms that interchain disulfide bonding
is not necessary for triple-helix folding. Thus, hybrid molecules containing sequences
from the proα2(I) C-terminal propeptide domain between the propeptide cleavage site
and the B-junction are able to form homotrimers with stable triple-helical domains
and, therefore, contain all of the information necessary to direct chain self-assembly.
These results indicate that the signal(s) which controls chain selectivity must be
located between the B-junction and the C-terminus of the C-propeptide. Neither
proα2(I):(III)C^S-C nor proα1(III):(I)C chains are able to fold into a triple helix. The
inability of these reciprocal constructs to self-associate suggests that chain selectivity
is mediated either by a co-linear sequence that spans the C-junction or by
discontinuous sequence domains located on either side of the C-junction.

2.3 Identification of a sequence motif from the proα1(III) C-propeptide which
directs chain self-assembly

Procollagen chain selectivity is probably mediated through one or more of the
variable domains located within the C-terminal propeptide domain. The sequence
between the B- and C-junctions is one of the least conserved among the procollagen
C-propeptides (Figure 2), yet the inventors have demonstrated that inclusion of this
domain, in the absence of proα1(III) sequence distal to the C-junction, is not
sufficient to direct chain assembly. To ascertain whether the chain recognition
sequence had indeed been interrupted, a further recombinant,
proα2(I):(III)BGR^S-C (B-G replacement), was generated, which contained all of the
proα2(I)Δ1 sequence apart from the Ser→Cys mutation at Cys2 and a stretch of 23
amino acids derived from the type III C-propeptide which spans the C-junction from
points B to G, the B-G motif: GNPELPEDVLDVQLAFLRLLSSR (underscoring
indicates the most divergent residues, see Figure 2). The location of the G-boundary
in the replacement motif allowed for the inclusion of the first non-conserved residues
after the C-junction (SR). When expressed in the presence of SP-cells the chimeric
proα2(I):(III)BGR^S-C chains were able to form inter-chain disulfide-bonded molecules
(Figure 8, lane 6), demonstrating that the C-terminal propeptide domains were capable
of self-association. Furthermore, this hybrid was able to fold and form a stable
triple-helix as judged by the formation of a protease-resistant fragment (Figure 9, lane 3).
Proα2(I):(III)BGR^S-C contains a Ser→Cys substitution which enabled the inventors
to assay for the formation of disulfide-bonded trimers. Previous data demonstrated
that this substitution alone does not enable wild-type proα2(I)Δ1 chains to form
homotrimers (Lees and Bulleid, 1994). Nevertheless, to eliminate the possibility that
this mutation influences the assembly pattern, a revertant, proα2(I):(III)BGR^C-S, which
contains the wild-type complement of Cys residues, was created. As expected,
proα2(I):(III)BGR^C-S was unable to form disulfide-bonded trimers (Figure 10, lane [...])
but did assemble correctly into a protease-resistant triple helix (Figure 11, lane 3).
Thus, the 23-residue B-G motif contains all of the information required to direct
procollagen self-assembly.

The ability of the proα2(I):(III)BGR^S-C chains to form interchain disulfide
bonds suggests that this molecule is able to associate via its C-propeptide. However,
to confirm that this is indeed the case the inventors carried out a collagenase digestion
of the products of the translation (Figure 12). Bacterial collagenase specifically
digests the triple-helical domain, leaving both the N- and C-propeptides intact. The
N-propeptides of both chains do not contain any methionine residues and, as a
consequence, the only radiolabelled product remaining after digestion is the
C-propeptide. Comparison of the samples separated under reducing and non-reducing
conditions demonstrated that inter-chain disulfide-bonded trimers were formed within
the C-terminal propeptide domains of proα1(III)Δ1 and proα2(I):(III)BGR^S-C chains
(Figure 12, lanes 2 and 4, and 3 and 5). This demonstrates that these chains do indeed
associate via their C-terminal propeptide domains.



2.4 The effect of Leu→Met substitution on proα2(I):(III)BGR assembly

Analysis of the 23 amino acid B-G motif from the proα1(III) and proα2(I)
chains (Figure 13) indicates that residues 13-20 (QLAFLRLL) are identical with the
exception of position 17, Leu (L) in proα1(III) and Met (M) in proα2(I). Using
site-directed mutagenesis the inventors substituted the existing Leu residue with Met to
create proα2(I):(III)BGR^L-M and monitored the effect of this mutation on chain
assembly. The Leu→Met mutagenesis was performed using recombinant
proα2(I):(III)BGR^S-C. Both proα2(I):(III)BGR^S-C and proα2(I):(III)BGR^L-M were able
to form interchain disulfide-bonded molecules when analysed under non-reducing
conditions (Figure 10, lanes 4 and 6). Both constructs formed protease-resistant
triple-helical domains (Figure 11, lanes 1 and 3). The Leu→Met substitution did not,
therefore, disrupt the process of chain selection nor did it prevent the formation of a
correctly aligned triple-helix. These observations lead to the conclusion that a
discontinuous sequence of 15 amino acids (GNPELPEDVLDV....SSR) contains all of
the information necessary to allow procollagen chains to discriminate between each
other and assemble in a type-specific manner.
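The relationship between the 23-residue B-G motif, the conserved stretch at residues 13-20, the position-17 substitution and the 15-residue discontinuous sequence can be checked with a few lines of string handling. This is a sketch for illustration; the helper names are hypothetical, and positions are counted from 1 as in the text.

```python
BG_MOTIF = "GNPELPEDVLDVQLAFLRLLSSR"  # 23-residue B-G motif from the proα1(III) chain


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive)."""
    return seq[start - 1:end]


def substitute(seq: str, position: int, old: str, new: str) -> str:
    """Replace the residue at a 1-based position, checking the expected original."""
    if seq[position - 1] != old:
        raise ValueError(f"position {position} is {seq[position - 1]}, not {old}")
    return seq[:position - 1] + new + seq[position:]


conserved = residues(BG_MOTIF, 13, 20)
# Residues 1-12 plus 21-23 make up the discontinuous 15-residue sequence.
recognition = residues(BG_MOTIF, 1, 12) + residues(BG_MOTIF, 21, 23)
leu_to_met = substitute(BG_MOTIF, 17, "L", "M")

print(conserved)    # QLAFLRLL
print(recognition)  # GNPELPEDVLDVSSR
print(leu_to_met)   # GNPELPEDVLDVQLAFMRLLSSR
```

The substitution falls inside residues 13-20, so the 15 flanking residues are unchanged in the Leu→Met variant.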

3. DISCUSSION

The molecular mechanism which enables closely related procollagen chains to
discriminate between each other is a central feature of the assembly pathway. The
initial interaction between the C-terminal propeptide domains ensures both that the
constituent chains are correctly aligned prior to nucleation of the triple-helix and
propagation in a C- to N- direction, and that component chains associate in a collagen
type-specific manner. As a consequence, recognition signals which determine chain
selectivity are assumed to reside within the primary sequence of this domain,
presumably within a region(s) of genetic diversity. By generating chimeric
procollagen molecules from parental 'mini-chains' proα1(III)Δ1 and proα2(I)Δ1 the
inventors have demonstrated that transfer of the proα1(III) C-terminal propeptide
domain to the naturally heterotrimeric proα2(I) molecule was sufficient to direct
formation of homotrimers. Furthermore, analysis of a series of molecules in which
specific sequences were interchanged between the proα1(III) and proα2(I) C-terminal
propeptide domains allowed the inventors to identify a discontinuous sequence of 15
amino acids (GNPELPEDVLDV....SSR) within the proα1(III) C-propeptide which,
if transferred to the corresponding region within the proα2(I) chain, directs
self-assembly into homotrimers. Transfer of the proα1(III) recognition motif to
the proα2(I) chain did not appear to have an adverse effect on chain alignment,
allowing the triple-helical domains to fold into a protease-resistant conformation. This
sequence motif is, therefore, both necessary and sufficient to ensure that procollagen
chains discriminate between each other and assemble in a type-specific manner.

In order to establish a structure-function relationship for the chain recognition
domain, the inventors examined the hydropathy profile and secondary structure
potential of the 23-residue B-G sequence: GNPELPEDVLDVQLAFLRLLSSR. The
data indicate that the 15-residue chain recognition motif, GNPELPEDVLDV....SSR, is
markedly hydrophilic, in contrast to the hydrophobic properties of the conserved
region, QLAFLRLL. These features are entirely consistent with a potential role for
this motif in mediating the initial association between the component procollagen
monomers. An examination of the 15-residue recognition motif from other fibrillar
procollagens predicts that they are all relatively hydrophilic and probably assume a
similar structural conformation, regardless of the degree of diversity in the primary
sequence (Figure 13). It is, presumably, the nature of the amino acid changes which
provides the distinguishing topographical features necessary to ensure differential
chain association. An examination of the B-G sequence alignment (Figure 13)
indicates that residues 1, 2, 12 and 21 are more tightly conserved than amino acids
3-11, 22 and 23, suggesting that the latter may form a core recognition sequence that is
of critical importance in the selection process. We do not know whether the other
four residues participate directly in chain discrimination but this can be tested
experimentally by site-directed mutagenesis.
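The contrast in hydropathy between the two parts of the B-G sequence can be illustrated with a simple average over a standard scale. The text does not state which hydropathy scale or window was used, so the Kyte-Doolittle scale and the plain per-segment mean below are assumptions made for this sketch; on that scale negative values are hydrophilic and positive values hydrophobic.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues of a peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


RECOGNITION_MOTIF = "GNPELPEDVLDV" + "SSR"  # residues 1-12 and 21-23
CONSERVED_REGION = "QLAFLRLL"               # residues 13-20

recognition_score = mean_hydropathy(RECOGNITION_MOTIF)
conserved_score = mean_hydropathy(CONSERVED_REGION)

print(f"recognition motif: {recognition_score:.2f}")  # -0.75
print(f"conserved region:  {conserved_score:.2f}")
```

On this assumed scale the 15-residue motif averages below zero and the conserved region above it, which agrees in direction with the description in the text; a sliding-window profile would be needed to reproduce a full hydropathy plot.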

The inventors have identified the functional domain which determines chain
selectivity and show that trimerization is initiated via an interaction(s) between these
identified recognition sequences. It is unclear, however, whether the interactions
which determine chain composition are the same as those which allow productive
association and stabilization of the trimer. The nature of potential stabilizing
interactions is uncertain, but recent data (Bulleid et al., 1996) indicate that, for type III
procollagen at least, the formation of interchain disulfide bonds does not play a direct
role in procollagen assembly. It has also been postulated that a cluster of four
aromatic residues, which are conserved in the fibrillar collagens, collagens X, VIII
and collagen-like complement factor C1q, may be of strategic importance in
trimerization.

The C-telopeptides were originally proposed to have a role in both procollagen
assembly and in chain discrimination, the latter by virtue of the level of sequence
diversity between various procollagen chains. However, the inventors have recently
demonstrated (Bulleid et al., 1996) that the C-telopeptides of type III collagen do not
interact prior to nucleation of the triple-helix, ruling out a role for this peptide
sequence in the initial association of the C-propeptides. Data obtained from the
assembly of hybrid chains indicate that the ability to discriminate between chains
does not segregate with the species of C-telopeptide, lending support to this assertion.

Using this approach the inventors have been able to synthesize an entirely
novel procollagen species comprising three proα2(I)Δ1 chains, [proα2(I)Δ1]3.
Throughout this study procollagen 'mini-chains' with truncated triple-helical domains
were used; however, the inventors have also demonstrated that full-length proα2(I)
chains containing the 15-residue proα1(III) recognition sequence also self-associate
into a triple-helical conformation (data not shown). Thus, the ability to introduce the
chain recognition sequence into different pro-α chains provides the means to design
novel collagen molecules with defined chain compositions. This, in turn, introduces
the possibility of producing collagen matrices with defined biological properties, such
as enhanced or differential cell-binding or adhesion properties. Furthermore, the
identification of a short peptide sequence which directs the initial association between
[...] modulation or inhibition of collagen deposition.